perm filename NIH.PRO[1,LMM] blob sn#077060 filedate 1973-12-07 generic text, type T, neo UTF8
                         DRAFT                                                   
                     NIH PROPOSAL                                                
                                            Buchanan, Smith                      
                                            11/19/73                             
                                                                                 
      PREAMBLE                                                                   
                                                                                 
  I.  INTRODUCTION                                                               
                                                                                 
      A.  Objectives                                                             
      B.  Background and Rationale                                               
      C.  Relationship to SUMEX and the Genetics Research Center                 
                                                                                 
 II.  SPECIFIC AIMS                                                              
                                                                                 
III.  METHODS                                                                    
                                                                                 
 IV.  SIGNIFICANCE OF PROPOSED RESEARCH                                          
                                                                                 
  V.  FACILITIES & EQUIPMENT                                                     
                                                                                 
 VI.  ORGANIZATIONAL FRAMEWORK                                                   
                                                                                 
VII.  BIBLIOGRAPHY                                                               
                                                                                 
                                                                                 
                                                                                 
                                                                                 
PREAMBLE                                                                         
                                                                                 
This renewal application requests funds for continued support of                 
resource-related research and applications in the area of chemistry and          
artificial intelligence.  Previous funding supported a
collaborative scientific effort that has produced significant results.
                                                                                 
Previous efforts were also subsidized by both the Advanced Research              
Projects Agency (ARPA) in areas of computer science research, and
by the National Aeronautics and Space Administration (NASA) for                  
instrumentation.  These additional funds have been or will soon be               
terminated:  to continue this research, part of the burden of support            
must be shifted in a way that more accurately reflects the true cost             
of supporting our research.  We have made every effort to reduce                 
the requested budget in ways which will not severely impact the                  
research and which reflect the most efficient use of existing                    
resources of talent, instrumentation and computer programs.                      
                                                                                 
Termination of funding would prejudice the utilization of existing               
resources and ongoing research in other areas of NIH interest (an                
estimated $300,000 of mass spectrometry laboratory facilities alone).            
                                                                                 
Our project is the only systematic effort currently underway in this             
country (to our knowledge) for computer assisted structure elucidation           
(there is presently an intensive program underway in Japan in the                
same area).  This situation may be contrasted with computer assisted             
organic synthesis, an area receiving considerable attention from                 
several research groups.  Our efforts could not be begun again from              
scratch without the expenditure of prohibitive amounts of money and              
wasteful duplication of effort.  These capabilities can be beneficially          
provided to a wider community via the SUMEX resource.  Research                  
involving the augmentation of human intellect by computer programs               
may dramatically affect the ways in which chemical research is done
in the future.                                                                   
                                                                                 
The personnel associated with the project constitute a unique and                
valuable resource.  Over the past five years we have assembled at                
Stanford an energetic team of scientists with experience in many
aspects of computer science and chemical structure elucidation,
as well as an experienced and efficient technical support staff.  This           
intellectual resource is an important component of this proposal.                
Without such capable personnel, the proposal would not be feasible.              
Without the financial support from this proposal, this line of                   
collaborative research will have to be abandoned and the mass                    
spectrometry facility will have to be closed.                                    
                                                                                 
                                                                                 
I.  INTRODUCTION                                                                 
                                                                                 
Significant resources of instrumentation, computer programs and                  
people have been assembled through the support of various granting               
agencies, including the NIH in its current grant for resource-related            
research.  The research proposed in this renewal application will                
extend the capabilities of these resources and ensure their operation
in the service of other research.                                                
                                                                                 
A.  Objectives                                                                   
                                                                                 
In the past several years, this project has developed special facilities         
and experience for molecular structure elucidation using artificial              
intelligence (AI) programs and spectroscopic data derived primarily from         
mass spectrometry (MS).  This proposal requests support in order to:             
                                                                                 
1)  Develop a combined gas chromatography/high resolution mass                   
spectrometry (GC/HRMS) system that is reliable enough to be used                 
routinely.  When this system is developed, service will be available             
to the Stanford community and research collaborators and, if our                 
resources permit, to any scientist requesting assistance.                        
                                                                                 
2)  Apply advanced artificial intelligence techniques to the                     
scientific inference problems of molecular structure elucidation and             
theory formation from spectroscopic data.                                        
                                                                                 
3)  Investigate mixtures of biologically important compounds, for                
example, marine sterols, and compounds isolated from extracts of                 
human urine.  High resolution mass spectrometry and combined gas                 
chromatography-high resolution mass spectrometry are excellent                   
structure elucidation techniques for these problems, especially in               
conjunction with the artificial intelligence programs.  Where                    
possible, additional information from other spectroscopic techniques             
will also be used for structure elucidation.                                     
                                                                                 
-----------------------------------------------------------------------          
* "High resolution" is a misnomer in the sense that the basic function           
  of a high resolution mass spectrometer is to provide the capability            
  for accurate mass determinations, so that elemental compositions can           
  be assigned to each ion.  This capability can be achieved in some              
  cases even at "low" resolving powers.                                          
----------------------------------------------------------------------           
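
The point of the footnote can be illustrated with a minimal sketch:
given a measured mass and a tolerance, list every elemental composition
that fits.  The function name, the element limits and the tolerances
below are illustrative assumptions, not part of the project's data
reduction programs, and the mass of the electron is ignored.

```python
from itertools import product

# Monoisotopic masses (u) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def compositions(target, tol, max_counts):
    """Return every composition whose mass lies within tol (u) of target.

    max_counts maps each element symbol to the largest atom count tried.
    Hypothetical helper for illustration only.
    """
    elements = list(max_counts)
    hits = []
    # Try every combination of atom counts up to the stated limits.
    for counts in product(*(range(max_counts[e] + 1) for e in elements)):
        mass = sum(n * MASS[e] for n, e in zip(counts, elements))
        if abs(mass - target) <= tol:
            hits.append(dict(zip(elements, counts)))
    return hits

limits = {"C": 2, "H": 4, "N": 2, "O": 1}
print(len(compositions(28.0, 0.5, limits)))   # nominal mass 28: CO, N2, CH2N, C2H4
print(compositions(27.9949, 0.002, limits))   # accurate mass: only CO remains
```

With a tolerance of half a mass unit, four compositions share nominal
mass 28; with a tolerance of 0.002 u only CO fits, which is the sense in
which accurate mass measurement, rather than resolving power as such,
assigns an elemental composition to an ion.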
                                                                                 
B.  Background and Rationale
                                                                                 
1.  The Structure Elucidation Problem                                            
                                                                                 
     a)  The General Chemical Problem.  Analysis of molecular                    
structure (as opposed to synthesis) is one of the major activities in            
chemistry related research.  For the specific task of elucidating                
molecular structures, chemists utilize a mixture of information                  
derived from chemical procedures and spectroscopic techniques.  Each             
item of information, if not redundant or uninterpretable, contributes            
to the solution of the problem.  Chemists draw upon a tremendous body            
of specific knowledge about chemistry, molecular structure,                      
spectroscopic techniques, etc., in order to piece together this                  
information and infer the structure of molecules.  These features               
make the problem particularly well-suited for applications of the                
techniques of artificial intelligence to assist research workers                 
performing the task.                                                             
                                                                                 
     b)  Djerassi's Laboratory.  Professor Djerassi has been concerned
with structure elucidation problems since the beginning of his                   
chemical research.  His activities at Stanford have been concerned               
heavily with the application of particular spectroscopic techniques              
to structural studies of biomedically important compounds.  These                
techniques include optical rotatory dispersion (ORD) and, more                   
recently, magnetic circular dichroism (MCD) (both of them supported              
initially by the NIH).  More recently he has been concerned with mass            
spectrometry because of the power of the technique, in terms of                  
specificity and sensitivity, as an analytical tool for structure                 
elucidation.                                                                     
                                                                                 
Although the technique of mass spectrometry may not be sufficient for            
all structure determination problems, it is a very powerful tool in              
areas where there exists a body of knowledge about the behavior of               
related molecules in the mass spectrometer.  Also, when sample size is
limited, mass spectrometry may well be the only technique that can be
utilized.  In both cases, the recent availability of high resolution             
mass spectrometers has made HRMS the technique of choice because it
yields an empirical formula, rather than only a nominal mass, for
each ion.  On a parallel course, the technique of GC/MS, routinely
available with low resolution mass spectrometers (GC/LRMS), has                  
revolutionized investigations wherever complex mixtures are encountered.         
All of the above considerations argue that an extension of mass                  
spectrometry at Stanford to provision of GC/HRMS on a routine basis              
would be the next logical step toward more powerful structure                    
elucidation for researchers depending on this facility.  This system,            
applied to complex mixtures, will produce empirical formulae of all ions         
in the spectra of the mixtures.  It is also expected that the data from          
mass spectrometry would provide the most powerful input in many cases to         
the AI programs assisting in the analysis, prior to consideration of             
other types of spectroscopic information.                                        
                                                                                 
2.  Historical Background                                                        
                                                                                 
     a)  Mass Spectrometry Laboratory.  Prior to the existing DENDRAL            
grant, the groundwork was laid for computerization of the existing               
mass spectrometers, an Associated Electrical Industries MS-9 high                
resolution mass spectrometer and an Atlas CH-4 low resolution mass               
spectrometer.  This work, supported primarily by NASA via the                    
Instrumentation Research Laboratory (IRL) in the Department of Genetics,         
resulted in link-up to the then existing ACME computer facility via a            
PDP-11 mini-computer which acted as a buffer between the spectrometers           
and ACME.  Initial data acquisition and reduction programs were                  
written for the system and utilized on a limited basis.  The funding             
of the DENDRAL proposal in conjunction with additional resources                 
provided by the IRL resulted in a major effort to upgrade these                  
capabilities and to link the new mass spectrometer to the system.  The           
fruits of these efforts are described under Section I.B.3 (below).
                                                                                 
     b)  Summary of Early DENDRAL Development.
In 1964, Lederberg devised a notational algorithm for chemical                   
structures (termed DENDRAL) that allowed questions of molecular                  
structure to be framed in precise graph-theoretic terms.  He also                
showed how to use the DENDRAL algorithm to generate complete and                 
irredundant lists of structural isomers.                                         
                                                                                 
In 1965-66 Lederberg and Feigenbaum began exploring the idea of                  
using the isomer generator in an artificial intelligence program -               
searching the space of possible structures for plausible solutions               
to a problem much as a chess-playing program searches the space of               
legal moves for the best moves.  This approach guarantees that every             
possible solution to a problem is considered - either implicitly, as             
when whole classes of unstable structures are rejected, or                       
explicitly, as when a complete molecule is tested for plausibility.              
In either case, an investigator easily determines the criteria for               
rejection and acceptance and knows that no possibilities have been               
forgotten.  This approach also guarantees that structures appear in              
the list only once - that symmetric representations of the same                  
complex molecule have not been included.  In both these respects the             
computer program has an advantage over manual approaches to structure            
elucidation.                                                                     
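
The two guarantees described above, that every candidate is considered
and that each appears only once, can be illustrated on the simplest
case: the acyclic carbon skeletons of the alkanes.  The sketch below is
not the DENDRAL algorithm.  It is a brute-force stand-in that grows
skeletons one carbon at a time and uses a canonical form to discard
symmetric duplicates; the function names are hypothetical.

```python
def canonical(adj):
    """Canonical string for an unrooted tree given as adjacency lists."""
    n = len(adj)
    degree = [len(a) for a in adj]
    # Strip leaves layer by layer until only the centre (1 or 2 nodes) is left.
    leaves = [v for v in range(n) if degree[v] <= 1]
    remaining = n
    while remaining > 2:
        remaining -= len(leaves)
        nxt = []
        for v in leaves:
            for w in adj[v]:
                degree[w] -= 1
                if degree[w] == 1:      # w has just become a leaf
                    nxt.append(w)
        leaves = nxt

    def encode(v, parent):
        # Sorting the child encodings makes the string order-independent.
        kids = sorted(encode(w, v) for w in adj[v] if w != parent)
        return "(" + "".join(kids) + ")"

    return min(encode(c, -1) for c in leaves)

def alkane_skeletons(n, max_degree=4):
    """Every carbon skeleton (tree) with n carbons, each listed once."""
    trees = {"()": [[]]}                # canonical form -> adjacency lists
    for size in range(1, n):
        grown = {}
        for adj in trees.values():
            for v in range(size):
                if len(adj[v]) < max_degree:   # carbon valence limit
                    new = [list(a) for a in adj] + [[v]]
                    new[v].append(size)
                    grown.setdefault(canonical(new), new)
        trees = grown
    return list(trees.values())

print([len(alkane_skeletons(n)) for n in range(1, 9)])
# [1, 1, 1, 2, 3, 5, 9, 18]
```

The counts agree with the known numbers of structural isomers of the
alkanes from methane through octane.  Exhaustiveness follows because
every skeleton can be reached by adding one carbon to a smaller one, and
irredundancy follows because two drawings of the same skeleton share one
canonical form.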
                                                                                 
     c)  Initial Collaboration with Djerassi.  Lederberg and Feigenbaum
realized that (a) only through application to real problems could the
worth of the AI approach be evaluated, and (b) mass spectrometry
appeared to be a fruitful applications area because of the close
relationship between spectral fragmentation patterns and molecular
structure for many classes of molecules.  Djerassi's interest and
expertise led to a series
of publications describing the approach and initial results of the               
programs.  The success of these collaborative efforts led to the                 
proposal to the NIH for initial funding to extend these efforts.                 
                                                                                 
     d)  Efforts Under NIH Funding.  The initial funding by NIH                  
provided the opportunity to upgrade the instrumentation and computer             
programs.  In particular we were able to mount a concerted project               
on both the analysis of mass spectra and the mathematical aspects of             
molecular structure.  Progress reports to the NIH describe this                  
research in detail.  The most recent annual report appears in                    
Appendix ***.  A series of publications directed to audiences both in            
computer science and chemistry are listed in bibliography Z.  The                
following section (Section 3) summarizes the capabilities for                    
structure elucidation which, in themselves, constitute an important              
result of past work.                                                             
                                                                                 
An important side effect of the DENDRAL project is the extent to which           
additional research was inspired and carried out to fill gaps in                 
existing knowledge.  This research, not supported by the DENDRAL grant,          
has been beneficial to on-going DENDRAL work, and vice-versa.                    
Publications which have arisen from this research are listed in                  
bibliography.  A brief review of these publications should indicate              
the need for precise specification of the knowledge elicited from                
chemists and used in computer programs.  As an example, consider the             
description and application of an early algorithm for generation of              
cyclic structural isomers (Y.M. Sheikh, et al., 1970).  This paper
considered the problem of spectroscopic differentiation of isomers of            
C6H10O.  Unsaturated ethers are one of the classes of isomeric compounds         
which must be considered, but the mass spectrometry of unsaturated               
ethers had not been investigated systematically.  This work was                  
subsequently carried out in Professor Djerassi's laboratory independent
of DENDRAL support, but of benefit to DENDRAL (Morizur and Djerassi,             
1971).  Other examples will be found in Bibliography.                            
                                                                                 
3.  Existing Capabilities                                                        
                                                                                 
This research team has already developed unusual capabilities for                
chemical structure elucidation, bringing together a high quality                 
HRMS system and AI programs applied to chemistry.  We have demonstrated          
the feasibility of our analytical approach in several problem areas,             
and have developed both a mass spectrometry system and a general set             
of computer programs for use in new areas.                                       
                                                                                 
The most outstanding capabilities are summarized below, followed by              
brief discussions of each.  These are available immediately, and                 
were developed primarily under NIH funding to this project, with                 
additional support supplied by ARPA and NASA in specific areas.  (These          
agencies have reduced funding levels for this work, however, leaving             
the NIH as the source of support for future development of applications          
programs in the area of artificial intelligence and chemistry.)                  
                                                                                 
     a.  High Resolution Mass Spectrometry System and Coupled Gas                
Chromatography/Low Resolution Mass Spectrometry System.  We have coupled         
the NIH-supported Varian-MAT 711 High Resolution Mass Spectrometer with          
a Hewlett Packard Gas Chromatograph and demonstrated its utility.                
Advanced data reduction techniques for this instrument exist in the              
dedicated PDP 11/20 and Stanford's 370/158.
                                                                                 
     b.  DENDRAL Structure Generator                                             
The DENDRAL Structure Generator is a unique computer program capable of          
exhaustive and irredundant generation of isomers, with and without               
rings.  This program is the "legal move generator" that guarantees               
consideration of every candidate structure - either implicitly, as when          
whole classes of structures are forbidden, or explicitly, as when                
individual compounds in a class are specified.  A labelling algorithm,           
which is essential to structure generation, is capable of producing              
answers to many structural questions.  For example, it can list all              
structures resulting from substituting a carbo-cyclic skeleton with some         
numbers of different groups.                                                     
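
The kind of question the labelling algorithm answers can be shown with a
small sketch.  It is a brute-force illustration under simplifying
assumptions (a single ring treated as a regular polygon, stereochemistry
ignored), not the project's labelling algorithm, and the function name
is hypothetical.

```python
from itertools import permutations

def ring_labellings(labels):
    """Distinct arrangements of labels around a ring.

    Two arrangements count as the same structure when one can be turned
    into the other by rotating or flipping the ring.
    """
    n = len(labels)
    seen = set()
    for arrangement in set(permutations(labels)):
        variants = []
        for seq in (arrangement, arrangement[::-1]):         # ring and its mirror
            variants.extend(seq[i:] + seq[:i] for i in range(n))  # all rotations
        seen.add(min(variants))          # one representative per symmetry class
    return sorted(seen)

# Six-membered ring carrying two Cl and four H.
print(len(ring_labellings(["Cl", "Cl", "H", "H", "H", "H"])))   # 3
```

The three arrangements found for two identical substituents on a
six-membered ring correspond to the 1,2-, 1,3- and 1,4-disubstituted
isomers.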
                                                                                 
     c.  DENDRAL Planner                                                         
We have written a set of computer programs for determining structural            
features from analytic data in well-defined areas.  Such planning                
programs have been written for low and high resolution mass                      
spectrometry, interpreted proton NMR spectroscopy and 13CMR data.                
                                                                                 
     d.  INTSUM                                                                  
INTSUM is a computer program that aids in finding interpretive rules for         
mass spectrometry.  The program interprets a large collection of mass            
spectrometry data according to criteria specified by a chemist.  Then it         
summarizes the data to show which of the possible interpretations seem           
most plausible.                                                                  
                                                                                 
     e.  Ancillary Techniques                                                    
1.  The mass spectrometry facility provides other types of experiments           
in mass spectrometry, including ultra-high resolution measurements               
(masses determined via peak matching), metastable ion determinations             
(Barber-Elliott technique) and low ionizing voltage experiments.                 
These data are utilized by both chemists and programs where appropriate.         
2.  Additional computer programs provide added problem-solving assistance.       
  a.  Predictor program for predicting major features of mass spectra.           
  b.  Programs for drawing and displaying chemical structures.                   
  c.  Subroutines developed in conjunction with or existing as parts of          
the Structure Generator for problems of partitioning, construction of            
vertex-graphs, and constructive graph labelling.  These can be applied           
to answer certain questions of isomerism which do not require the                
complete generator.  For example, the labelling algorithm can list               
all structures resulting from substituting a carbocyclic skeleton with           
some numbers of different functional groups.                                     
                                                                                 
     f.  Other Spectroscopic Techniques                                          
Available to us are the spectroscopic facilities of Professor Djerassi's
laboratory for work requiring additional spectroscopic data.  Also
available on a fee-for-service basis are the extensive spectroscopic
facilities of the chemistry department.  These would be utilized for
collecting additional data on particular structure problems and
gathering data on known compounds (particularly in the area of
13CMR) as the AI programs become knowledgeable about other spectroscopic
information.                                                                     
                                                                                 
     g.  Chemical Facilities                                                     
We possess, in Professor Djerassi's chemical laboratories,
substantial synthetic capabilities and general chemical know-how.
This resource can be called upon to provide assistance in synthesis
of model or labelled compounds, derivatization of mixtures, and so
forth.  As an example of the use made of these facilities, a graduate
student is presently engaged in thesis research dealing with synthesis
of a new estrogen metabolite strongly suspected to be a component of
certain pregnancy urines.
                                                                                 
4.  User Community.                                                              
We feel that the maximum economic utilization of existing facilities,            
and those proposed, can be realized by sharing them with a community of          
users.  Without additional funds for a major service facility, the
facilities will primarily serve the following groups, but will be
informally available to others.
     A.  Stanford Community                                                      
          i)  Stanford Chemistry Department                                      
               Djerassi - Steroids, marine sterols                               
                                                                                 
                          (list after questionnaire return)                      
                                                                                 
         ii)  Stanford Medical School Collaborators                              
                                                                                 
                          (list after questionnaire return)                      
                                                                                 
     B.  Extramural Users.                                                       
The development of the techniques of ORD, MS and MCD at Stanford has
been paralleled by extensive sharing of these resources nation- and
world-wide in collaborative research efforts, without any additional
funding.  Experience has shown that exercising some discretion in the
selection of problems, rather than providing simple service, results in
better utilization of the people and instrumentation involved.  We would extend
this provision of services to include available computer programs where          
appropriate along the lines of our successful collaboration with                 
Professor Adlercreutz, University of Helsinki.                                   
                                                                                 
II.  SPECIFIC AIMS                                                               
                                                                                 
     1.  Develop routine GC/HRMS techniques of utility to                        
biomedical scientists with structure elucidation problems.  Prototype            
GC/HRMS systems have been developed at Stanford and elsewhere, but this          
type of facility (in contrast to GC/LRMS) does not seem to be routinely          
available.  Although we wish to give our computer programs (see Aim 2)           
the flexibility to deal with other analytic data, our own efforts on             
instrumentation will be centered on GC/HRMS, for reasons explained in            
Section (I).                                                                     
                                                                                 
     2.  Develop new computer programs, and improve existing ones, for           
assisting analytical chemists with structure elucidation problems and            
theory formation.  Computer programs have already been written for               
analysis of low and high resolution mass spectra, for generation of
acyclic and cyclic molecular structures, for labelling structural
skeletons with atoms, for analyzing C13 NMR spectra of amines, and for
interpretation and summary of large volumes of data gathered on model            
compounds.  We wish to increase the utility of these routines by                 
providing interactive programs that allow easier access to the programs,         
by increasing their generality and power, and by supplementing them with         
new reasoning programs.                                                          
                                                                                 
     3.  Apply the structure elucidation techniques - both                       
instrumentation and computer programs - to biomedically relevant                 
compounds.                                                                       
                                                                                 
                                                                                 
                                                                                 
III.  METHODS                                                                    
                                                                                 
Chemical structure elucidation requires the intelligent and patient              
application of a large body of chemical knowledge to each specific               
problem.  Because of the importance and relative difficulty of the               
problem, we believe computer programs can provide powerful assistance            
to chemists in their analyses.  It is unlikely that such programs will           
ever replace chemists, in part because computer programs are written             
to focus on rather narrow aspects of problems.  But it is reasonable
to view our past research as a demonstration of the computer's ability
to assist chemists, although this was a spinoff from theoretically
oriented research.  We wish to stress that our present aim is to
provide assistance for structure elucidation problems.                           
                                                                                 
In order to meet the major objectives of this proposal we will focus             
our attention primarily on structure elucidation through mass                    
spectrometry and artificial intelligence.  However, many of the                  
computer programs can already use information from other analytical              
techniques.  We therefore wish to treat structure elucidation
in the context of an ensemble of analytic capabilities.
                                                                                 
The specific aims enumerated in Section (II) will be pursued in the              
highly inter-disciplinary manner that has characterized the DENDRAL              
project under NIH support.  The aims are not separate, but
interact with and depend upon each other.  For example, we
feel that the power of mass spectrometry and, potentially, other                 
spectroscopic techniques, can be enhanced by the use of computer                 
programs to perform various aspects of structure elucidation.  From              
the standpoint of computer science, one measure of the utility of                
techniques of artificial intelligence is how well they perform in                
real-world applications.  We have focused our interest on AI                     
programs for structure elucidation and the related area of theory                
formation, primarily in mass spectrometry.  It is necessary in the               
development of these programs to have a source of data and personnel             
able to criticize methods and results.                                           
                                                                                 
A)  Development of Routine GC/HRMS Facility                                      
                                                                                 
We have developed a significant resource consisting of                           
instrumentation (the Varian MAT-711 and ancillary equipment) and                 
computer programs for instrument evaluation and data acquisition and             
reduction.  Routine reduction of high resolution mass spectra to                 
elemental compositions and ion abundances without human intervention             
provides the capability for efficient handling of large volumes of high          
resolution mass spectra.  The development of the gas chromatography              
and of the GC/MS combination is in the excellent hands of Ms. Wegmann,           
formerly head of Hewlett-Packard's gas chromatography applications
laboratory.  She is responsible for operation of the complete system.            
We now have more than two years of operational experience with the               
mass spectrometer, the gas chromatograph and related equipment under             
a wide variety of experimental conditions.                                       
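
The reduction of a measured mass to candidate elemental compositions,
mentioned above, can be illustrated by the following minimal sketch.
It is an illustration only and is not the laboratory's reduction
program.  The function name, the restriction to C, H, N and O, the
atom-count limits and the 10 ppm tolerance are all assumptions made
for the example; electron mass and valence rules are ignored.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def compositions(measured_mass, tolerance_ppm=10.0, limits=None):
    """Return [(c, h, n, o), ...] whose exact mass matches measured_mass."""
    # Upper bounds on atom counts are illustrative assumptions.
    limits = limits or {"C": 30, "H": 62, "N": 4, "O": 6}
    window = measured_mass * tolerance_ppm * 1e-6
    hits = []
    for c in range(limits["C"] + 1):
        for n in range(limits["N"] + 1):
            for o in range(limits["O"] + 1):
                heavy = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
                remainder = measured_mass - heavy
                if remainder < -window:
                    break  # adding more oxygen only overshoots further
                # The hydrogen count is fixed by whatever mass is left over.
                h = round(remainder / MASS["H"])
                if 0 <= h <= limits["H"]:
                    exact = heavy + h * MASS["H"]
                    if abs(exact - measured_mass) <= window:
                        hits.append((c, h, n, o))
    return hits


# A measured mass of 45.0578 u is consistent only with C2H7N here.
print(compositions(45.0578))
```

A tighter tolerance shortens the candidate list, which is the
practical argument for high resolving power in this step.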
                                                                                 
Members of the biomedical community (see User Community) desiring
access to our facilities for structure elucidation have a variety of
problems, some
of which can be solved by existing instrumentation and computer                  
techniques.  However, many problems consist of complex mixtures of               
compounds where analysis by conventional GC/LRMS does not lead to                
unambiguous solutions and separation of components on a preparative              
scale for other spectroscopic analysis is difficult.  These problems             
are amenable to attack by a system consisting of a GC/HRMS combination,
the GC providing separation, coupled with the mass spectrometer
operating at high resolution to provide highly specific information.
Thus, upgrading of our current system so that GC/HRMS data can be
provided on a routine basis is a desirable, and we believe necessary,
step to solve many of these problems.                                            
                                                                                 
We were able to perform some preliminary experiments to evaluate the             
feasibility of operating a GC/MS system at high mass spectrometer                
resolving powers.  These experiments were hampered somewhat by the               
limitations of the computer system used to acquire the data (only                
occasional, single scans were possible) and were discontinued, as was
all HRMS operation, on the termination of the ACME computer facility.
We do have, however, some benchmark figures against which to evaluate
the level of performance of the proposed system.  We were able to obtain
good quality mass measurements over a dynamic range of 100:1 for
sample sizes of the order of 0.5-1.0 micrograms/component during
mass scans of 8 sec/decade (resolving powers 5,000-8,000).
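
To give a feeling for what these scan conditions demand of the data
system, the sketch below estimates the time spent sweeping across a
single peak.  It assumes an exponential mass scan and takes the peak
width to be m/R; both are assumptions made for the illustration, and
the function name is hypothetical.

```python
import math


def peak_dwell_time(seconds_per_decade, resolving_power):
    """Seconds spent sweeping one peak width (m/R) in an exponential scan."""
    # m(t) = m0 * 10**(-t / T)  =>  |dm/dt| = m * ln(10) / T
    # Sweeping dm = m / R therefore takes T / (R * ln(10)), independent of m.
    return seconds_per_decade / (resolving_power * math.log(10))


for r in (5000, 8000):
    print(r, round(peak_dwell_time(8.0, r) * 1e3, 2), "ms")
```

Under these assumptions each peak is crossed in well under a
millisecond, which is why scan rate, resolving power and sensitivity
must be optimized together.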
                                                                                 
We propose to operate our existing GC/MS system under high resolution            
conditions aiming toward optimization of resolving powers, scan rates            
and GC and molecular separator operating conditions to determine the             
maximum usable sensitivity of the system.                                        
                                                                                 
We recognize that the ultimate sensitivity will not approach that                
attainable by photographic methods of recording; we feel that the                
ability for real-time operation and evaluation of the operating                  
conditions of the mass spectrometer partially offsets the sensitivity            
disadvantages.  We realize that some structure elucidation problems              
will not be amenable to study because of the sensitivity limitations;            
we feel, however, that many problems of interest to the User                     
Community can be studied effectively with this performance capability.           
Rather than propose a research program to increase the sensitivity of            
high resolution mass spectrometers (e.g., McLafferty, et al., dynamic
rescanning of peaks; JPL - photon emission/detector arrays), we                  
propose to identify our limitations and, with our collaborators, use             
discretion in selecting and preparing samples.                                   
                                                                                 
Any HRMS system requires computer support; our proposed GC/HRMS                  
facility requires a significant amount of support to process mass                
spectral data in a reasonable length of time.  There are several                 
options which might be pursued to obtain this support.  They are                 
described in detail in the accompanying budget justification.                    
                                                                                 
B)  Computer Assisted Structure Elucidation                                      
                                                                                 
As mentioned in Section A above, the Planner program can be used                 
immediately for structure elucidation problems using mass spectrometry           
data.  The program has been described in detail elsewhere and is                 
mentioned in the section on existing capabilities.  Its performance is           
excellent precisely in the areas where mass spectrometry, by itself, is          
capable of definitive structure analysis.  Where the history of a sample         
is known, so that potential classes of compounds are restricted, and             
where the rules of mass spectrometric fragmentation are known in detail          
for the classes, the performance of both chemists and the program is
excellent, although the program offers some advantages in its exhaustive
and rapid analysis of the data.  Many structure elucidation problems of          
the user community fit into this category and existing resources can             
fulfill these needs.                                                             
                                                                                 
Mass spectrometry cannot solve all structure elucidation problems,               
however.  In such cases, the chemist turns to other spectroscopic                
techniques if sample size permits.  As described in the introductory             
section, the chemist pieces diverse information together to achieve a            
solution.  Interactive computer programs can assist the chemist in               
this procedure, with the advantages of exhaustive evaluation of the              
data and the molecular structures suggested by these data.                       
                                                                                 
In our own work and in that work planned with collaborators we can
call upon the extensive facilities of the chemistry department for               
acquisition of additional spectroscopic data to assist in the                    
application of the software systems to real problems.  These                     
facilities are on a fee-for-service basis, such fees presumably being
paid from existing research grants of the user community.  There are             
sufficient literature examples of structure elucidation problems to              
obviate the requirement for extensive use of these additional                    
facilities in development of the programs.                                       
                                                                                 
We propose to develop these software facilities in the following way:            
                                                                                 
1)  The recently completed structure generator will be the core of our           
efforts to assist chemists in structure elucidation.  The structure              
generator can guarantee that the correct solution is somewhere in the            
list of possibilities.  Additional programs, such as the Planner, allow
us to avoid exhaustive generation in practice.  Some parts of this
program have not yet been extensively tested, and completing these
tests will be our first task.
                                                                                 
2)  The SUMEX resource will provide the capability for development               
of an interactive system and also provide the mechanism by which                 
others can gain access to the programs as they are developed.                    
                                                                                 
3)  The structure elucidation task as carried out by chemists is                 
strongly directed toward rejection of whole categories (e.g.,                    
compound classes) of solutions as quickly as possible by using as                
much knowledge of chemical history or characteristics as is available.           
Details of spectroscopic data are then used to define the molecular              
framework more precisely.  Each step in this procedure represents the            
application of constraints on the set of possible solutions which                
must be considered.  Computational efficiency demands that these                 
constraints be applied early in the generation process when the                  
structure generator is utilized.                                                 
                                                                                 
We have made some effort to examine the kinds of constraints used by             
chemists engaged in structure elucidation.  We have begun designing              
strategies so that these constraints can be brought to bear on the               
structure generator.  Some of these strategies involve minor changes             
to the existing program; others require significant extensions of                
existing generating functions.  In either case, we propose to continue           
these investigations so that a reasonable variety of constraints can             
be recognized and utilized effectively by a computer program.  These
investigations are the first steps toward increasing the chemical knowledge
of a program which views chemical structures and their manipulation              
as mathematical entities and transforms.                                         
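
The computational argument in item 3 can be illustrated with a toy
generator.  The sketch below is not the structure generator itself:
it enumerates linear chains of atom symbols, ignores symmetry, and
uses two hypothetical constraints.  It shows only that rejecting a
partial structure as soon as it violates a constraint gives the same
solutions as generating everything and testing afterwards, while
examining fewer nodes.  The approach is valid only for constraints
which, once violated by a partial structure, remain violated by every
extension of it.

```python
def generate(length, alphabet, allowed, prune_early):
    """Return (solutions, nodes_expanded) for chains of the given length."""
    solutions, expanded = [], 0

    def extend(prefix):
        nonlocal expanded
        expanded += 1
        if prune_early and not allowed(prefix):
            return  # reject this whole family of structures at once
        if len(prefix) == length:
            if prune_early or allowed(prefix):
                solutions.append(prefix)
            return
        for symbol in alphabet:
            extend(prefix + symbol)

    extend("")
    return solutions, expanded


def allowed(chain):
    # Hypothetical constraints: no O-O linkage and at most one nitrogen.
    return "OO" not in chain and chain.count("N") <= 1


early, n_early = generate(4, "CNO", allowed, prune_early=True)
late, n_late = generate(4, "CNO", allowed, prune_early=False)
print(early == late, n_early, n_late)
```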
                                                                                 
4)  At present, effective use of the structure generator or its subroutines
for special problems requires a detailed knowledge of the program.  We           
propose to develop an interface between chemists and the program to              
remove this requirement.  The interface would contain elements of                
structure input and display routines and a simple language for                   
application of constraints.  Portions of these elements are available            
from other workers (e.g., Richard Feldman, NIH) and we would draw on             
these sources whenever possible.                                                 
                                                                                 
5)  We propose that initial efforts will be directed toward a system             
where the chemist examines his own data and inputs his findings (in              
terms of allowed and disallowed structural features) to the program              
as constraints.  The generator would then provide a list of possible             
solutions which can be evaluated by the chemist, who can then iterate            
on this procedure.                                                               
                                                                                 
6)  There are three feasible, but longer term, extensions of this                
approach which we feel are potentially very valuable.  We propose                
to begin at least preliminary investigations on the following:                   
                                                                                 
     a)  We have the capability now for automatic interpretation of              
mass spectral data.  Results of this interpretation can be applied               
directly to a structure generator.  Similar Planners could be written            
for automatic analysis of data from other spectroscopic techniques, as           
we have illustrated for 13CMR.                                                   
                                                                                 
     b)  A program with detailed knowledge about information                     
obtainable from various spectroscopic techniques could examine a list            
of candidate solutions and propose experiments necessary and                     
sufficient to distinguish among them.  We have illustrated this                  
capability previously in the area of mass spectrometry using the                 
Predictor.                                                                       
                                                                                 
     c)  The structure generator's view of chemistry is
two-dimensional and presently unconstrained by such ideas as bond lengths
and angles, steric hindrance, and so forth.  Because stereochemical              
considerations are extremely important in structure elucidation, we              
propose to begin consideration of stereochemistry in the structure               
generation process.  Lederberg has previously discussed ways in which            
three-dimensional information can be considered in the generation and
representation of molecular structures.  More recently, the work of              
Wipke in connection with computer assisted organic synthesis has                 
provided important results which we would attempt to utilize to                  
avoid unnecessary duplication of effort.                                         
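
Extension (b) above, proposing experiments that distinguish among
candidate solutions, can be sketched in a very reduced form.  The
sketch assumes that some predictor has already supplied a set of
expected peaks for each candidate; the candidate names, the peak
values and both function names are hypothetical.  It selects the
single peak whose presence or absence divides the candidates most
evenly, which is one simple heuristic and not necessarily the
criterion a finished program would use.

```python
def discriminating_peaks(predicted):
    """Peaks predicted for some, but not all, of the candidates."""
    if not predicted:
        return []
    every_peak = set().union(*predicted.values())
    shared = set.intersection(*predicted.values())
    return sorted(every_peak - shared)


def best_split(predicted):
    """Peak whose presence or absence divides the candidates most evenly."""
    total = len(predicted)

    def imbalance(peak):
        having = sum(peak in peaks for peaks in predicted.values())
        return abs(2 * having - total)

    return min(discriminating_peaks(predicted), key=imbalance, default=None)


# Hypothetical predicted peaks (m/e) for four candidate structures.
PREDICTED = {
    "A": {30, 44, 58},
    "B": {30, 58, 72},
    "C": {30, 44, 72},
    "D": {30, 86},
}
print(discriminating_peaks(PREDICTED), best_split(PREDICTED))
```

The peak at 30 is predicted for every candidate and therefore cannot
distinguish among them; it is excluded before any choice is made.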
                                                                                 
                                                                                 
C)  Theory Formation
                                                                                 
One important aim of this work is to improve the existing theory                 
formation capabilities and thus provide more assistance to scientists            
investigating regularities within classes of compounds.  This is a               
theory formation task at a very pragmatic level.  The mass spectrometry          
theory that the program attempts to find is of the same form as the one          
practicing mass spectroscopists use for structure elucidation.  Thus,            
resulting pieces of theory are extensions to both the scientists' theory
and the computer's theory of the discipline.  To improve this program we
need to complete the Plan-Generate-Test program that has been started            
(as described in the appended annual report) and tune it over many test          
cases.  We also wish to make the programs interactive and easy to use so         
that they are more readily accessible.  This can be done when the                
programs are transferred to the SUMEX facility.                                  
                                                                                 
We plan to apply the theory formation program to two different kinds             
of data:  (a) the data collected in the interest of understanding                
the mass spectrometry of a particular class of compounds, such as                
estrogenic steroids, and (b) collections of diverse data that may                
provide some insight into more general fragmentation mechanisms.  For            
example, by studying the mass spectra of monofunctional compounds we             
would hope to find rules that lead to a better understanding of more             
complex compounds.                                                               
                                                                                 
The INTSUM program mentioned in Section (I) is the planning phase of             
the theory formation program.  It currently runs in batch mode on                
Stanford's 360/67 computer.  We wish to add an interactive monitor
to INTSUM to give an investigator the ability to set up his own                  
conditions for interpreting the mass spectra and to control the type             
of summary he wishes to see.  For example, if he is interested in the            
allowable hydrogen transfers associated with one specific process the            
program could be instructed to produce a very specific summary.  Also,           
we wish to add an interactive program for answering questions about              
the results.  For example, an investigator should be able to find out            
easily how many processes involve cleavage of a specific bond and how            
strong their resulting mass spectral peaks are.                                  
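
The kind of question described above can be illustrated with a short
sketch.  The record layout, the field names, the bond labels and the
intensity values are hypothetical and do not reflect the actual
format of INTSUM output.

```python
from statistics import mean

# Hypothetical summary records; the field names are assumptions.
PROCESSES = [
    {"name": "P1", "bonds_broken": {"C13-C17"}, "mean_intensity": 40.0},
    {"name": "P2", "bonds_broken": {"C13-C17", "C14-C15"}, "mean_intensity": 10.0},
    {"name": "P3", "bonds_broken": {"C9-C10"}, "mean_intensity": 5.0},
]


def cleavage_summary(processes, bond):
    """Return (number of processes breaking `bond`, their average intensity)."""
    matching = [p for p in processes if bond in p["bonds_broken"]]
    if not matching:
        return 0, None
    return len(matching), mean(p["mean_intensity"] for p in matching)


print(cleavage_summary(PROCESSES, "C13-C17"))
```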
                                                                                 
The INTSUM program is now used routinely by chemists engaged in                  
investigations of the mass spectrometric fragmentation of various                
classes of organic compounds, primarily steroids.  A manuscript is now           
in preparation (Hammerum and Djerassi, 1973) describing the                      
fragmentation of progesterone and related compounds.  The program was            
used extensively in this work.  We are now beginning a detailed                  
examination of the fragmentation of steroids related to the androstane           
skeleton, particularly testosterones.  We propose to continue to use             
the INTSUM program in its present form and as it is improved in                  
support of these studies.                                                        
                                                                                 
The generator of rules that we now have does a credible job of                   
explaining the regularities summarized by INTSUM.  It has found, for             
example, the well-known alpha-cleavage fragmentation process and
beta-cleavage followed by rearrangement in the low resolution data for
fifteen aliphatic amines.  The program will be extended in two important         
ways to increase its utility:  (i) the program needs to be able to work          
with an increased number of descriptive predicates in the generation of          
rules, and (ii) it needs to be given a more flexible representation of           
complex fragmentation mechanisms so that it can find rules involving             
more than two bonds.                                                             
                                                                                 
It will also be desirable to provide interactive programs for the                
investigator to query the rule generation program.  Many
questions now arise about the inference steps by which the program
reaches the rules it suggests as explanations of the regularities.
Why, for example,
was some particular rule not considered plausible?                               
                                                                                 
The test phase of the theory formation program remains to be written.            
It will verify the rules by testing them against new data - preferably           
against results of carefully selected new experiments.  It will                  
modify or delete rules on the basis of counter examples.  It will                
also have to design so-called "crucial experiments" that allow                   
differentiation among competing rules.                                           
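
Since the test phase is not yet written, the following is only a
sketch of its simplest function, the rejection of rules on the basis
of counter examples.  A rule is reduced here to a pair (structural
feature, expected process); this representation, the function name
and the data are assumptions made for the example.  Modification of
rules and the design of crucial experiments are not attempted.

```python
def check_rules(rules, observations, max_counterexamples=0):
    """Split rules into (kept, rejected) according to new observations."""
    kept, rejected = [], []
    for feature, process in rules:
        # Only compounds containing the feature can confirm or refute the rule.
        relevant = [obs for obs in observations if feature in obs["features"]]
        counterexamples = [
            obs for obs in relevant if process not in obs["processes"]
        ]
        if len(counterexamples) <= max_counterexamples:
            kept.append((feature, process))
        else:
            rejected.append((feature, process))
    return kept, rejected


# Hypothetical rules and new data.
RULES = [("amine", "alpha-cleavage"), ("amine", "loss-of-methyl")]
NEW_DATA = [
    {"features": {"amine"}, "processes": {"alpha-cleavage"}},
    {"features": {"amine"}, "processes": {"alpha-cleavage", "loss-of-methyl"}},
    {"features": {"ketone"}, "processes": set()},
]
print(check_rules(RULES, NEW_DATA))
```

A rule for which the new data contain no relevant compound is kept
unchanged in this sketch, since nothing has been learned about it.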
                                                                                 
D)  Applications to Biomedically Relevant Compounds
                                                                                 
We can immediately offer to the user community the Planner for
analysis of high resolution mass spectra in terms of molecular                   
structure.  The program is, of course, insensitive to the source of              
the mass spectral data, and we foresee significant use of the program            
for analysis of spectra from the GC/HRMS facility without additional             
programming effort.                                                              
                                                                                 
We foresee that the GC/HRMS facility will be used in studies of the              
following nature:                                                                
                                                                                 
1)  Djerassi - marine sterols - isolation and characterization of                
mixtures of sterols from marine organisms.  This research is supported           
by the NIH.  This work is presently carried out by GC/LRMS techniques            
and isolation of milligram quantities of individual sterols by TLC or            
GC for further characterization.  Although high resolution mass                  
spectra alone may not be sufficient for structural characterization,             
the extra information may be crucial, particularly for minor                     
components where isolation of larger quantities of material is                   
difficult or impossible.  If larger amounts of material are available,           
the proposed computer program development (part B, above) will also
be of assistance in analysis of additional spectroscopic data.  These            
arguments, of course, hold true for other areas of interest outlined             
below.                                                                           
                                                                                 
2)  Djerassi - hormonal steroids - we plan on continued collaboration            
with Professor Adlercreutz on analysis of estrogen mixtures; GC/HRMS             
might be very effective in future collaboration with Adlercreutz in a            
variety of related areas.  Present work on improvement of mass                   
spectrometry theory for other classes of steroids (to be used in the             
Planner) is being carried out by postdoctoral fellows who provide                
their own financial support.                                                     
                                                                                 
3)  Genetics Center - the screening activities of the Genetics Center            
research use exclusively GC/LRMS techniques.  Past experience has                
shown the difficulties in identification of unknown components in                
extracts of human urine by LRMS data alone.  We plan, through                    
collaborative efforts, to provide necessary GC/HRMS data where                   
required, and to use our computer programs to assist in these studies.           
                                                                                 
4)  and following - questionnaire results                                        
    ***                                                                          
                                                                                 
                                                                                 
IV.  SIGNIFICANCE OF PROPOSED RESEARCH                                           
                                                                                 
Structure elucidation is an important and difficult problem for                  
scientists in the biomedical community.  This research aims at providing
more powerful techniques for determining molecular structures than are           
now routinely available.  In particular, we have proposed (a) developing         
routine GC/HRMS instrumentation as a means of collecting powerful                
analytic data for scientists; (b) developing (and extending)
sophisticated computer programs to assist with the interpretation of the
data from mass spectrometry and elsewhere; (c) developing (and
extending) novel computer programs to assist with formulation of the
rules of interpretation; and (d) applying these state-of-the-art
techniques to problems of biomedical relevance.  No other research group
can claim such a broad-based attack on the problems of structure                 
elucidation.                                                                     
                                                                                 
The proposed research not only holds promise for significant long-term           
advances, it can have immediate benefits as well.  Many members of the           
biomedical community at Stanford have called upon the mass spectrometry          
laboratory for assistance in the past and will continue to do so in the          
future.  HRMS is an important source of data for these problems, and             
GC/HRMS is still more important.  Previous investment by the NIH in the          
Varian MAT-711 HRMS system at Stanford can be utilized now and built             
upon for the future.  Continued operation of the mass spectrometer will          
give the Stanford community access to state-of-the-art spectroscopic             
techniques and to professional mass spectroscopists who can help with            
ongoing problems.                                                                
                                                                                 
The computer programs themselves constitute a unique resource for                
assisting with the structure determination.  The previous NIH grant              
supported development of the programs.  In part, we are requesting               
funds to exploit these programs.                                                 
                                                                                 
One of the most significant aspects of this work is its                          
interdisciplinary view of solving chemical structure problems by                 
searching the space of chemical graph structures.  As a result of posing         
the structure determination problem in this framework, we have been able         
to further the systematization of chemistry in at least three ways.              
First, the knowledge of chemistry used by analytic chemists has been             
made more precise for use in a computer program.  Second, codifying such         
knowledge for the computer has led to the discovery of new research              
areas to extend our existing knowledge of chemistry.  Several                    
publications listed in the bibliography (Refs. 42 and following) are             
reports of exactly this kind of research.  Finally, the computer's
search through the space of possible structures gives the practicing             
scientist the confidence that no structures were merely overlooked -             
many whole classes may not have been explicitly enumerated, but that is          
because the computer rejected those classes using precise criteria that          
are themselves explicit.  For this last reason, the computer program has         
the potential to augment a chemist's reasoning power in a way that has
never before been possible.                                                      
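
The accounting described above can be illustrated with a minimal sketch.
It is not the DENDRAL generator: as a stated assumption, nominal-mass
formulas containing only C, H and O stand in for the much richer space
of chemical graph structures, and the function name and the two
rejection criteria are hypothetical choices made for illustration.  The
point is only that every candidate is either accepted or rejected for a
stated reason, so nothing is silently dropped.

```python
# Toy illustration only: nominal-mass formulas C(c)H(h)O(o) stand in for
# the far richer space of chemical graph structures searched by DENDRAL.
from typing import List, Tuple

Formula = Tuple[int, int, int]  # (carbons, hydrogens, oxygens)


def search(nominal_mass: int) -> Tuple[List[Formula], List[Tuple[Formula, str]]]:
    """Enumerate every C/H/O formula with the given nominal mass.

    Returns (accepted, rejected).  Each rejected candidate is paired
    with the name of the explicit criterion that eliminated it.
    """
    accepted: List[Formula] = []
    rejected: List[Tuple[Formula, str]] = []
    for c in range(nominal_mass // 12 + 1):
        remaining = nominal_mass - 12 * c
        for o in range(remaining // 16 + 1):
            h = remaining - 16 * o  # hydrogens make up the rest of the mass
            candidate = (c, h, o)
            if h > 2 * c + 2:
                # A C/H/O molecule cannot hold more than 2C+2 hydrogens.
                rejected.append((candidate, "more hydrogens than 2C+2"))
            elif h % 2 == 1:
                # Neutral C/H/O molecules have an even hydrogen count.
                rejected.append((candidate, "odd hydrogen count"))
            else:
                accepted.append(candidate)
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = search(58)
    print("accepted:", ok)
    for candidate, reason in bad:
        print("rejected:", candidate, "because", reason)
```

For a nominal mass of 58 the sketch accepts three formulas (C2H2O2,
C3H6O and C4H10) and lists every other candidate together with the
criterion that excluded it.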
                                                                                 
                                                                                 
V.  FACILITIES & EQUIPMENT                                                       
                                                                                 
The Stanford Mass Spectrometry Laboratory will provide GC/HRMS on                
the Varian MAT-711 mass spectrometer coupled with a Hewlett-Packard              
gas chromatograph.  As service instruments for more routine mass
spectral analyses, the laboratory has MS-9 and CH-4 mass
spectrometers.  A PDP 11/20 computer with one disk drive currently
provides the only dedicated data-reduction capability in the                     
laboratory.                                                                      
                                                                                 
Data reduction beyond the minimal capability of the PDP 11/20 can be             
provided on Stanford's IBM 370/158 computer.  (The PDP 11/20 presently has
only the capability for buffering peak profile data between the mass             
spectrometer and the IBM 370/158 computer at the Stanford Computer               
Center.) An alternative to buying time on the 370/158 is proposed and            
discussed in the budget justification.                                           
                                                                                 
The artificial intelligence programs will be run on the NIH-sponsored            
SUMEX computer facility (a PDP-10 computer with the TENEX operating              
system, 192K words of memory, and more than adequate peripherals).               
Running these programs on SUMEX will incur no charge.                            
                                                                                 
                                                                                 
                                                                                 
                                                                                 
DENDRAL PUBLICATIONS                                                             
                                                                                 
                                                                                 
 (1) J. Lederberg, "DENDRAL-64 - A System for Computer                           
Construction, Enumeration and Notation of Organic Molecules as                   
Tree Structures and Cyclic Graphs", (technical reports to NASA,                  
also available from the author and summarized in (12)).                          
    (1a) Part I. Notational algorithm for tree                                   
         structures (1964) CR.57029                                              
    (1b) Part II. Topology of cyclic graphs (1965) CR.68898                      
    (1c) Part III. Complete chemical graphs;                                     
         embedding rings in trees (1969)                                         
                                                                                 
 (2) J. Lederberg, "Computation of Molecular Formulas for Mass                   
Spectrometry", Holden-Day, Inc. (1964).                                          
                                                                                 
 (3) J. Lederberg, "Topological Mapping of Organic Molecules",                   
Proc. Nat. Acad. Sci., 53:1, January 1965, pp.  134-139.                         
                                                                                 
 (4) J. Lederberg, "Systematics of organic molecules, graph                      
topology and Hamilton circuits.  A general outline of the DENDRAL                
system."  NASA CR-48899 (1965)                                                   
                                                                                 
 (5) J. Lederberg, "Hamilton Circuits of Convex Trivalent
Polyhedra (up to 18 vertices)", Am. Math. Monthly, May 1967.
                                                                                 
 (6) G. L. Sutherland, "DENDRAL - A Computer Program for                         
Generating and Filtering Chemical Structures", Stanford Artificial               
Intelligence Project Memo No. 49, February 1967.                                 
                                                                                 
 (7) J. Lederberg and E. A. Feigenbaum, "Mechanization of                        
Inductive Inference in Organic Chemistry", in B. Kleinmuntz (ed)                 
Formal Representations for Human Judgment, (Wiley, 1968) (also                   
Stanford Artificial Intelligence Project Memo No. 54, August                     
1967).                                                                           
                                                                                 
 (8) J. Lederberg, "Online computation of molecular formulas from                
mass number."  NASA CR-94977 (1968)                                              
                                                                                 
 (9) E. A. Feigenbaum and B. G. Buchanan, "Heuristic DENDRAL: A Program          
for Generating Explanatory Hypotheses in Organic Chemistry", in                  
Proceedings, Hawaii International Conference on System Sciences,                 
B. K. Kinariwala and F. F. Kuo (eds), University of Hawaii Press,                
1968.                                                                            
                                                                                 
 (10) B. G. Buchanan, G. L. Sutherland, and E. A.  Feigenbaum,                   
"Heuristic DENDRAL: A Program for Generating Explanatory                         
Hypotheses in Organic Chemistry".  In Machine Intelligence 4 (B.                 
Meltzer and D. Michie, eds) Edinburgh University Press (1969),                   
(also Stanford Artificial Intelligence Project Memo No. 62, July                 
1968).                                                                           
                                                                                 
 (11) E. A. Feigenbaum, "Artificial Intelligence: Themes in the                  
Second Decade".  In Final Supplement to Proceedings of the IFIP68                
International Congress, Edinburgh, August 1968 (also Stanford                    
Artificial Intelligence Project Memo No. 67, August 1968).                       
                                                                                 
 (12) J. Lederberg, "Topology of Molecules", in The Mathematical                 
Sciences - A Collection of Essays, (ed.) Committee on Support of                 
Research in the Mathematical Sciences (COSRIMS), National Academy                
of Sciences - National Research Council, M.I.T. Press, (1969), pp.               
37-51.                                                                           
                                                                                 
 (13) G. Sutherland, "Heuristic DENDRAL: A Family of LISP                        
Programs", to appear in D. Bobrow (ed), LISP Applications (also                  
Stanford Artificial Intelligence Project Memo No. 80, March 1969).               
                                                                                 
 (14) J. Lederberg, G. L. Sutherland, B. G. Buchanan, E.  A.                     
Feigenbaum, A. V. Robertson, A. M. Duffield, and C.  Djerassi,                   
"Applications of Artificial Intelligence for Chemical Inference I.               
The Number of Possible Organic Compounds: Acyclic Structures                     
Containing C, H, O and N".  Journal of the American Chemical                     
Society, 91:11 (May 21, 1969).                                                   
                                                                                 
 (15) A. M. Duffield, A. V. Robertson, C. Djerassi, B.  G.                       
Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J.  Lederberg,                 
"Application of Artificial Intelligence for Chemical Inference II.               
Interpretation of Low Resolution Mass Spectra of Ketones".                       
Journal of the American Chemical Society, 91:11 (May 21, 1969).                  
                                                                                 
 (16) B. G. Buchanan, G. L. Sutherland, E. A.  Feigenbaum, "Toward               
an Understanding of Information Processes of Scientific Inference                
in the Context of Organic Chemistry", in Machine Intelligence 5,                 
(B.  Meltzer and D. Michie, eds) Edinburgh University Press                      
(1970), (also Stanford Artificial Intelligence Project Memo No.                  
99, September 1969).                                                             
                                                                                 
 (17) J. Lederberg, G. L. Sutherland, B. G. Buchanan, and E.  A.                 
Feigenbaum, "A Heuristic Program for Solving a Scientific                        
Inference Problem: Summary of Motivation and Implementation",                    
Stanford Artificial Intelligence Project Memo No. 104, November                  
1969.                                                                            
                                                                                 
 (18) C. W. Churchman and B. G. Buchanan, "On the Design of                      
Inductive Systems: Some Philosophical Problems".  British Journal                
for the Philosophy of Science, 20 (1969), pp.  311-323.                          
                                                                                 
 (19) G. Schroll, A. M. Duffield, C. Djerassi, B. G.  Buchanan, G.               
L. Sutherland, E. A. Feigenbaum, and J.  Lederberg, "Application                 
of Artificial Intelligence for Chemical Inference III.  Aliphatic                
Ethers Diagnosed by Their Low Resolution Mass Spectra and NMR                    
Data".  Journal of the American Chemical Society, 91:26 (December                
17, 1969).                                                                       
                                                                                 
 (20) A. Buchs, A. M. Duffield, G. Schroll, C. Djerassi, A.  B.                  
Delfino, B. G. Buchanan, G. L. Sutherland, E. A.  Feigenbaum, and                
J. Lederberg, "Applications of Artificial Intelligence For                       
Chemical Inference. IV. Saturated Amines Diagnosed by Their Low                  
Resolution Mass Spectra and Nuclear Magnetic Resonance Spectra",                 
Journal of the American Chemical Society, 92, 6831 (1970).                       
                                                                                 
 (21) Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M.                      
Duffield, C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A.                      
Feigenbaum and J. Lederberg, "Applications of Artificial                         
Intelligence for Chemical Inference V.  An Approach to the                       
Computer Generation of Cyclic Structures.  Differentiation                       
Between All the Possible Isomeric Ketones of Composition                         
C6H10O", Organic Mass Spectrometry, 4, 493 (1970).                               
                                                                                 
 (22) A. Buchs, A.B. Delfino, A.M. Duffield, C. Djerassi,                        
B.G. Buchanan, E.A. Feigenbaum and J. Lederberg, "Applications                   
of Artificial Intelligence for Chemical Inference VI.  Approach                  
to a General Method of Interpreting Low Resolution Mass Spectra                  
with a Computer", Helvetica Chimica Acta, 53, 1394 (1970).
                                                                                 
 (23) E.A. Feigenbaum, B.G. Buchanan, and J. Lederberg, "On Generality           
and Problem Solving:  A Case Study Using the DENDRAL Program".  In               
Machine Intelligence 6 (B. Meltzer and D. Michie, eds.) Edinburgh                
University Press (1971).  (Also Stanford Artificial Intelligence                 
Project Memo No. 131.)                                                           
                                                                                 
 (24) A. Buchs, A.B. Delfino, C. Djerassi, A.M. Duffield, B.G. Buchanan,         
E.A. Feigenbaum, J. Lederberg, G. Schroll, and G.L. Sutherland, "The             
Application of Artificial Intelligence in the Interpretation of Low-             
Resolution Mass Spectra", Advances in Mass Spectrometry, 5, 314.                 
                                                                                 
 (25) B.G. Buchanan and J. Lederberg, "The Heuristic DENDRAL Program             
for Explaining Empirical Data".  In proceedings of the IFIP Congress 71,         
Ljubljana, Yugoslavia (1971).  (Also Stanford Artificial Intelligence            
Project Memo No. 141.)                                                           
                                                                                 
 (26) B.G. Buchanan, E.A. Feigenbaum, and J. Lederberg, "A Heuristic             
Programming Study of Theory Formation in Science."  In proceedings of            
the Second International Joint Conference on Artificial Intelligence,            
Imperial College, London (September, 1971).  (Also Stanford Artificial           
Intelligence Project Memo No. 145.)                                              
                                                                                 
 (27) Buchanan, B. G., Duffield, A.M., Robertson, A.V., "An Application          
of Artificial Intelligence to the Interpretation of Mass Spectra",               
Mass Spectrometry: Techniques and Applications, Edited by George
W. A. Milne, John Wiley & Sons, Inc., 1971, p. 121-77.                           
                                                                                 
 (28) D.H. Smith, B.G. Buchanan, R.S. Engelmore, A.M. Duffield, A. Yeo,          
E.A. Feigenbaum, J. Lederberg, and C. Djerassi, "Applications of                 
Artificial Intelligence for Chemical Inference VIII.  An approach to             
the Computer Interpretation of the High Resolution Mass Spectra of               
Complex Molecules.  Structure Elucidation of Estrogenic Steroids",               
Journal of the American Chemical Society, 94, 5962-5971 (1972).                  
                                                                                 
 (29) B.G. Buchanan, E.A. Feigenbaum, and N.S. Sridharan, "Heuristic             
Theory Formation:  Data Interpretation and Rule Formation".  In                  
Machine Intelligence 7, Edinburgh University Press (1972).                       
                                                                                 
 (30) Lederberg, J., "Rapid Calculation of Molecular Formulas from               
Mass Values".  Jnl. of Chemical Education, 49, 613 (1972).                       
                                                                                 
 (31) Brown, H., Masinter L., Hjelmeland, L., "Constructive Graph                
Labeling Using Double Cosets".  Discrete Mathematics (in press).                 
(Also Computer Science Memo 318, 1972).                                          
                                                                                 
 (32) B. G. Buchanan, Review of Hubert Dreyfus' "What Computers Can't
Do:  A Critique of Artificial Reason", Computing Reviews (January,
1973).  (Also Stanford Artificial Intelligence Project Memo No. 181.)
                                                                                 
 (33) D. H. Smith, B. G. Buchanan, R. S. Engelmore, H. Adlercreutz and
C. Djerassi, "Applications of Artificial Intelligence for Chemical               
Inference IX.  Analysis of Mixtures Without Prior Separation as                  
Illustrated for Estrogens".  Journal of the American Chemical Society            
95, 6078 (1973).                                                                 
                                                                                 
 (34) D. H. Smith, B. G. Buchanan, W. C. White, E. A. Feigenbaum,                
C. Djerassi and J. Lederberg, "Applications of Artificial Intelligence           
for Chemical Inference X.  Intsum.  A Data Interpretation Program as             
Applied to the Collected Mass Spectra of Estrogenic Steroids".                   
Tetrahedron, 29, 3117 (1973).                                                    
                                                                                 
 (35) B. G. Buchanan and N. S. Sridharan, "Rule Formation on                     
Non-Homogeneous Classes of Objects".  In proceedings of the Third                
International Joint Conference on Artificial Intelligence (Stanford,             
California, August, 1973).  (Also Stanford Artificial Intelligence               
Project Memo No. 215.)                                                           
                                                                                 
 (36) D. Michie and  B.G. Buchanan, "Current Status of the Heuristic             
DENDRAL Program for Applying Artificial Intelligence to the                      
Interpretation of Mass Spectra".  August, 1973.                                  
                                                                                 
 (37) H. Brown and L. Masinter, "An Algorithm for the Construction               
of the Graphs of Organic Molecules", Discrete Mathematics (in press).            
(Also Stanford Computer Science Department Memo STAN-CS-73-361,
May, 1973.)
                                                                                 
 (38) D.H. Smith, L.M. Masinter and N.S. Sridharan, "Heuristic                   
DENDRAL:  Analysis of Molecular Structure," Proceedings of the                   
NATO/CNA Advanced Study Institute on Computer Representation and                 
Manipulation of Chemical Information, in press.                                  
                                                                                 
 (39) R. Carhart and C. Djerassi, "Applications of Artificial                    
Intelligence for Chemical Inference XI:  The Analysis of C13 NMR Data            
for Structure Elucidation of Acyclic Amines", J. Chem. Soc. (Perkin II),         
1753 (1973).                                                                     
                                                                                 
 (40) L. Masinter, N. Sridharan, H. Brown and D.H. Smith, "Applications          
of Artificial Intelligence for Chemical Inference XII:  Exhaustive               
Generation of Cyclic and Acyclic Isomers.", submitted to Journal of              
the American Chemical Society.                                                   
                                                                                 
 (41) L. Masinter, N. Sridharan, H. Brown and D.H. Smith, "Applications          
of Artificial Intelligence for Chemical Inference XIII:  An Algorithm            
for Labelling Chemical Graphs", submitted to Journal of the American             
Chemical Society.                                                                
                                                                                 
Publications Describing DENDRAL-Related Research But Not Funded By               
This Grant                                                                       
                                                                                 
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems CLXXXIII.            
A Study of the Electron Impact Induced Fragmentation of Aliphatic                
Aldehydes.  J. Amer. Chem. Soc., 91, 6814 (1969).  By R.J. Liedtke and           
C. Djerassi.                                                                     
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems - CXCVII.            
Electron-Impact Induced Functional Group Interaction in                          
4-Benzyloxycyclohexyl Trimethylsilyl Ether.  Org. Mass Spectrom.,
4, 257 (1970).  By Paul D. Woodgate, Robin T. Gray and Carl Djerassi.
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems - CXCVIII.           
A Study of the Fragmentation Processes of Some α,β-Unsaturated
Aliphatic Ketones.  Org. Mass Spectrom., 4, 273 (1970).  By                      
Younus M. Sheikh, A.M. Duffield and Carl Djerassi.                               
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems CCII.                
Interaction of Remote Functional Groups in Acyclic Systems upon                  
Electron Impact.  J. Org. Chem., 36, 1796 (1971).  By M. Sheehan,                
R.J. Spangler, M. Ikeda and C. Djerassi.                                         
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems CCVII.               
Fragmentation of Unsaturated Ethers.  Org. Mass Spectrom., 5, 895 (1971).        
By J. P. Morizur and C. Djerassi.                                                
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems CCVIII.              
The Effect of Double Bonds Upon the McLafferty Rearrangement of                  
Carbonyl Compounds.  J. Amer. Chem. Soc., 94, 473 (1972).  By                    
J.R. Dias, Y.M. Sheikh and C. Djerassi.                                          
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems CCXV.                
Behavior of Phenyl-Substituted α,β-Unsaturated Ketones Upon Electron
Impact.  Promotion of Hydrogen Rearrangement Processes.  J. Org.                 
Chem., 37, 776 (1972).  By R.J. Liedtke, A.F. Gerrard, J. Diekman and            
C. Djerassi.                                                                     
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems CCXXII.              
Delineation of Competing Fragmentation Pathways of Complex Molecules             
from a Study of Metastable Ion Transitions of Deuterated Derivatives.            
Org. Mass Spectrom., 7, 367 (1973).  By D.H. Smith, A.M. Duffield and            
C. Djerassi.                                                                     
                                                                                 
The Carbon-13 Magnetic Resonance Spectra of Acyclic Aliphatic Amines.            
J. Amer. Chem. Soc., 95, 3710 (1973).  By H. Eggert and C. Djerassi.             
                                                                                 
The Carbon-13 Nuclear Magnetic Resonance Spectra of Keto Steroids.               
J. Org. Chem., 38, 3788 (1973).  By H. Eggert and C. Djerassi.                   
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems CCXXXVIII.           
The Effect of Heteroatoms Upon the Mass Spectrometric Fragmentation              
of Cyclohexanones.  J. Org. Chem., in press.  By J.H. Block, D.H. Smith,         
and C. Djerassi.                                                                 
                                                                                 
Mass Spectrometry in Structural and Stereochemical Problems CCXLII.              
Applications of DADI, a Technique for Study of Metastable Ions, to               
Mixture Analysis.  J. Amer. Chem. Soc., submitted for publication.               
By D.H. Smith, C. Djerassi, K.H. Maurer, and U. Rapp.                            
**********************************************************************           
Our past work in the area of mass spectrometric instrumentation has              
led to detailed knowledge of the performance capabilities of the mass            
spectrometer, implementation of some elements of computer control of             
the instrument and development of sophisticated programs to evaluate             
the performance of the spectrometer and to acquire and reduce mass               
spectra.  The instrument has received heavy use, with greatest emphasis          
being placed on high resolution mass spectral data* and evaluation of            
the GC/MS system.  We wish to upgrade these capabilities to provide              
a routine GC/HRMS system.                                                        
                                                                                 
                                                                                 
Our past work on applications of artificial intelligence to the                  
interpretation of mass spectra has given us a firm foundation on which           
to base broader explorations of molecular structure elucidation.  We             
intend to integrate state-of-the-art spectroscopic data collection,
especially GC/HRMS, with artificial intelligence techniques.                     
We wish also to explore additional techniques that would complement              
these in solving structure determination problems.                               
                                                                                 
Our recent work on finding mass spectrometry interpretation rules                
(theory formation) can provide additional unique capabilities for                
assisting with the problem solving.  We wish to continue this                    
research because it offers hope for a solution to the problem of                 
furnishing real-world knowledge to computer programs -- in particular            
to the computer programs that assist with structure elucidation.                 
This is a pressing problem in current AI research.  High performance
programs, of which DENDRAL is the most often cited, derive their power
from large stores of knowledge.  Yet there are no routine methods for            
infusing such systems with knowledge of the task domain.  We believe             
our research in theory formation holds a key to the solution of this             
problem.                                                                         
                                                                                 
                                                                                 
We believe that much of our previous work can be immediately useful              
to scientists elsewhere.  We have frequently provided assistance to              
collaborators in the past, often uncovering interesting research                 
questions in the process.  We hope to make the instrumentation and               
computer programs broadly available on a routine basis.  As a first              
step, we wish to make available the most useful aspects of our                   
current system to the community of scientists using the NIH-sponsored            
SUMEX computer facility.  (See Section IC for a brief discussion of              
SUMEX.)